Efficient String Matching with k Mismatches
Authors
Abstract
Given a text of length n, a pattern of length m and an integer k, we present an algorithm for finding all occurrences of the pattern in the text, each with at most k mismatches. The algorithm runs in O(k(m log m + n)) time.
1. INTRODUCTION
The problem of string matching with k mismatches is defined as follows. Suppose we are given a text of length n, a pattern of length m and an integer k. Find all occurrences of the pattern in the text with at most k locations in which the text and the pattern have different symbols. Note that the case k = 0 is the extensively studied string matching problem. Let us mention a few notable algorithms for the string matching problem: linear-time serial algorithms [BM], [GS], [KMP], [KR] (a randomized algorithm) and [Y], and parallel algorithms [G] and [V].
The problem has a strong pragmatic flavor. In practice, we often need to analyze situations where the data is not completely reliable. Specifically, consider a situation where the input strings to our problem contain errors, and we still need to find all possible occurrences of the pattern in the text as they occur in reality, that is, in the uncorrupted data. Assuming some bound on the number of errors clearly leads to our problem.
We present an algorithm for string matching with k mismatches that runs in O(k(m log m + n)) time on a random-access machine (RAM) [AHU].
After all the results in the present paper had been achieved, A. Slisenko brought to our attention the paper [I], in which another algorithm for the …
1. Department of Computer Science, School of Mathematical Sciences, Tel Aviv University.
2. Department of Computer Science, Courant Institute of Mathematical Sciences, New York University, and (present address) Department of Computer Science, School of Mathematical Sciences, Tel Aviv University. The research of this author was supported by NSF grant NSF-DCR-8318874 and ONR grant N00014-85-K-0046.
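To make the problem definition concrete, here is a minimal Python sketch. It is not the algorithm of this paper. `naive_k_mismatch` checks the definition directly in O(nm) time. `kangaroo_k_mismatch` illustrates a common alternative idea: each alignment is checked with at most k + 1 longest-common-extension (LCE) queries, stepping over one mismatch after each query. The function names, the `PrefixHash` helper, and the use of randomized polynomial hashing modulo 2^61 − 1 to answer LCE queries by binary search are all assumptions made for brevity. Under these assumptions the scan takes O(n + m + nk log m) time and can, with very small probability, report a spurious occurrence. It never misses a true one.

```python
import random

MOD = (1 << 61) - 1  # a Mersenne prime; hash collisions are very unlikely


def naive_k_mismatch(text: str, pattern: str, k: int) -> list[int]:
    """Start positions i where text[i:i+m] and pattern differ in at most
    k positions (Hamming distance <= k). Checks the definition in O(nm) time."""
    n, m = len(text), len(pattern)
    result = []
    for i in range(n - m + 1):
        mismatches = 0
        for j in range(m):
            if text[i + j] != pattern[j]:
                mismatches += 1
                if mismatches > k:
                    break  # this alignment already has too many mismatches
        if mismatches <= k:
            result.append(i)
    return result


class PrefixHash:
    """Polynomial prefix hashes of a string, giving O(1) substring hashes."""

    def __init__(self, s: str, base: int) -> None:
        self.h = [0] * (len(s) + 1)   # h[i] = hash of s[:i]
        self.pw = [1] * (len(s) + 1)  # pw[i] = base**i mod MOD
        for i, ch in enumerate(s):
            self.h[i + 1] = (self.h[i] * base + ord(ch)) % MOD
            self.pw[i + 1] = (self.pw[i] * base) % MOD

    def get(self, i: int, length: int) -> int:
        """Hash of s[i:i+length]."""
        return (self.h[i + length] - self.h[i] * self.pw[length]) % MOD


def lce(th: PrefixHash, ph: PrefixHash, i: int, j: int, limit: int) -> int:
    """Length of the longest common prefix of text[i:] and pattern[j:],
    capped at `limit`, found by binary search on hash equality."""
    lo, hi = 0, limit
    while lo < hi:
        mid = (lo + hi + 1) // 2
        if th.get(i, mid) == ph.get(j, mid):
            lo = mid
        else:
            hi = mid - 1
    return lo


def kangaroo_k_mismatch(text: str, pattern: str, k: int, seed=None) -> list[int]:
    """Same output as naive_k_mismatch (up to a tiny false-positive
    probability from hashing), using at most k + 1 LCE queries per alignment."""
    n, m = len(text), len(pattern)
    if m > n:
        return []
    base = random.Random(seed).randrange(256, MOD - 1)
    th, ph = PrefixHash(text, base), PrefixHash(pattern, base)
    result = []
    for i in range(n - m + 1):
        j, mismatches = 0, 0
        while True:
            j += lce(th, ph, i + j, j, m - j)  # jump over a matching stretch
            if j == m:
                result.append(i)  # reached the end with <= k mismatches
                break
            mismatches += 1  # position j of this alignment is a mismatch
            if mismatches > k:
                break
            j += 1  # step past the mismatch and continue
    return result
```

For example, `kangaroo_k_mismatch("abcabdabc", "abd", 1)` returns `[0, 3, 6]`. Alignments 0 and 6 ("abc") each have one mismatch, and alignment 3 is an exact occurrence.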
Similar resources
A Parallel Algorithm for Fixed-Length Approximate String-Matching with k-mismatches
This paper deals with the approximate string-matching problem with Hamming distance. The approximate string-matching with k mismatches problem is to find all locations at which a query of length m matches a factor of a text of length n with k or fewer mismatches. The approximate string-matching algorithms have both pleasing theoretical features, as well as direct applications, especially in comp...
Reduced Nondeterministic Finite Automata for Approximate String Matching
We will show how to reduce the number of states of nondeterministic finite automata for approximate string matching with k mismatches and nondeterministic finite automata for approximate string matching with k differences in the case when we do not need to know how many mismatches or differences are in the found string. We will also show the impact of this reduction on Shift-Or based algorithms.
On string matching with k mismatches
In this paper we consider several variants of the pattern matching problem. In particular, we investigate the following problems: 1) Pattern matching with k mismatches; 2) Approximate counting of mismatches; and 3) Pattern matching with mismatches. The distance metric used is the Hamming distance. We present some novel algorithms and techniques for solving these problems. Both deterministic and...
Approximate String Matching by Finite Automata
Approximate string matching is a sequential problem and therefore it is possible to solve it using finite automata. A nondeterministic finite automaton is constructed for string matching with k mismatches. It is shown how "dynamic programming" and "shift-and" based algorithms simulate this nondeterministic finite automaton. The corresponding deterministic finite automaton have O...
Space Complexity of Linear Time Approximate String Matching
Approximate string matching is a sequential problem and therefore it is possible to solve it using finite automata. Nondeterministic finite automata are constructed for string matching with k mismatches and k differences. The corresponding deterministic finite automata are the basis for approximate string matching in linear time. Then the space complexity of both types of deterministic automata is calculat...
Approximate Boyer-Moore String Matching
The Boyer-Moore idea applied in exact string matching is generalized to approximate string matching. Two versions of the problem are considered. The k mismatches problem is to find all approximate occurrences of a pattern string (length m) in a text string (length n) with at most k mismatches. Our generalized Boyer-Moore algorithm is shown (under a mild independence assumption) to solve the pro...
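Two of the related works listed above describe string matching with k mismatches as a nondeterministic finite automaton that is simulated with "shift-and" or "shift-or" bit operations. The following minimal sketch of that kind of simulation rests on our own assumptions and reproduces neither paper. It keeps one bit vector R_d for each mismatch budget d = 0, ..., k. Bit j of R_d records that pattern[0..j] matches the text ending at the current position with at most d mismatches. Python's unbounded integers stand in for machine words, and the name `shift_and_k_mismatch` is hypothetical.

```python
def shift_and_k_mismatch(text: str, pattern: str, k: int) -> list[int]:
    """Start positions where pattern occurs in text with at most k mismatches,
    found by bit-parallel simulation of a k-mismatch NFA (shift-and style)."""
    m = len(pattern)
    if m == 0:
        raise ValueError("pattern must be non-empty")
    if k < 0:
        raise ValueError("k must be non-negative")
    full = (1 << m) - 1    # keep every state vector to m bits
    accept = 1 << (m - 1)  # bit m-1: the whole pattern has been read
    # masks[c] has bit j set exactly when pattern[j] == c
    masks: dict[str, int] = {}
    for j, ch in enumerate(pattern):
        masks[ch] = masks.get(ch, 0) | (1 << j)
    # states[d], bit j: pattern[0..j] matches the text ending at the
    # current position with at most d mismatches
    states = [0] * (k + 1)
    result = []
    for i, ch in enumerate(text):
        b = masks.get(ch, 0)
        prev = states[:]  # the updates below must read the old vectors
        # d = 0: extend only where the text symbol equals pattern[j]
        states[0] = ((prev[0] << 1) | 1) & b
        for d in range(1, k + 1):
            matched = ((prev[d] << 1) | 1) & b   # extend with a match
            mismatched = (prev[d - 1] << 1) | 1  # spend one more mismatch
            states[d] = (matched | mismatched) & full
        if states[k] & accept:
            result.append(i - m + 1)  # the occurrence ends at position i
    return result
```

With a w-bit machine word and m ≤ w, each text symbol costs O(k) word operations, so the scan takes O(nk) time. For example, `shift_and_k_mismatch("abcabdabc", "abd", 1)` returns `[0, 3, 6]`.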
Journal title:
- Theor. Comput. Sci.
Volume 43, Issue
Pages -
Publication date 1986